Deep learning supports the differentiation of alcoholic and other-than-alcoholic cirrhosis based on MRI

Although CT and MRI are standard procedures in cirrhosis diagnosis, differentiation of etiology based on imaging is not established. This proof-of-concept study explores the potential of deep learning (DL) to support imaging-based differentiation of the etiology of liver cirrhosis. This retrospective, monocentric study included 465 patients with confirmed diagnosis of (a) alcoholic (n = 221) and (b) other-than-alcoholic (n = 244) cirrhosis. Standard T2-weighted single-slice images at the caudate lobe level were randomly split for training with fivefold cross-validation (85%) and testing (15%), balanced for (a) and (b). After automated upstream liver segmentation, two different ImageNet pre-trained convolutional neural network (CNN) architectures (ResNet50, DenseNet121) were evaluated for classification of alcohol-related versus non-alcohol-related cirrhosis. The highest classification performance on test data was observed for ResNet50 with unfrozen pre-trained parameters, yielding an area under the receiver operating characteristic curve of 0.82 (95% confidence interval (CI) 0.71–0.91) and an accuracy of 0.75 (95% CI 0.64–0.85). An ensemble of both models did not lead to significant improvement in classification performance. This proof-of-principle study shows that deep-learning classifiers have the potential to aid in discriminating liver cirrhosis etiology based on standard MRI.

www.nature.com/scientificreports/
Although it has been described that the micro- and macroscopic appearance of cirrhosis in medical imaging varies to some extent according to the underlying etiology, the use of imaging features to determine the cause of the disease has not yet been established 9,11,12. In a previous study, a convolutional neural network (CNN) was shown to detect liver cirrhosis on standard clinical MRI sequences with expert-level accuracy irrespective of etiology 13. Therefore, the aim of this proof-of-concept study was to investigate deep learning for standard MRI-based characterization of disease etiology, differentiating alcoholic versus other-than-alcoholic cirrhosis.

Materials and methods
Dataset. The study was approved by the Ethics Committee at the Medical Faculty of the Rheinische Friedrich-Wilhelms-Universität Bonn and the need for written informed consent was waived due to its retrospective, single-center nature. The research was performed in accordance with the Declaration of Helsinki. Patients with confirmed diagnosis of liver cirrhosis, defined by clinical manifestations of liver cirrhosis (e.g. presence of dermal features, ascites, splenomegaly or hyperestrogenism), laboratory parameters (e.g. presence of parameters of hepatocyte damage or impaired hepatic synthesis), and/or histopathological criteria, who underwent liver MRI for diagnostic purposes between 2017 and 2019 at the Department of Diagnostic and Interventional Radiology at the University Hospital of Bonn, were evaluated for inclusion. The clinical information management system of the relevant institution was used to derive clinical characteristics of the study population including the respective cause of liver cirrhosis. Patients with unknown causes of liver cirrhosis and with overlap of alcoholic and other causes were excluded. The final cohort was separated according to the underlying cause of liver cirrhosis into (a) patients with alcoholic liver cirrhosis and (b) other-than-alcoholic liver cirrhosis (Fig. 1).
Image segmentation and classification. All patients underwent a standardized imaging protocol including a standard clinical respiratory-triggered multi-slice turbo spin echo sequence with non-Cartesian k-space filling (T2 MultiVane XD) on a clinical 1.5 Tesla (Ingenia 1.5 T, Philips Healthcare, Best, the Netherlands) or 3.0 Tesla (Ingenia 3.0 T, Philips Healthcare, Best, the Netherlands) scanner. This sequence was shown to be suitable for deep learning-based detection of liver cirrhosis in a previous study 13. Similar to the approach proposed for cirrhosis detection, a single cross-sectional image at the level of the caudate lobe was exported, followed by liver segmentation performed by a U-net style convolutional neural network (CNN) with ResNet34 as backbone that was developed and validated on a dataset of 713 single-slice T2-weighted MRI images 13. The images were first normalized, and image augmentation was applied during training. Supplementary information on imaging parameters and image preprocessing can be found in Supplement S1 and S2.
For development of a classification CNN that differentiates patients with alcoholic liver cirrhosis from patients with other-than-alcoholic liver cirrhosis, data were randomly split into a training set (85%) and a hold-out test set (15%). Training was performed with fivefold cross-validation. An ensemble of the cross-validated models was applied to the test set. A CNN with residual connections (ResNet50) with ImageNet pre-trained parameters was used, as this established architecture was shown to be suitable for the detection of liver cirrhosis 13,14. To investigate whether a pre-trained architecture other than ResNet50, or an ensemble of two architectures, is beneficial, a CNN with dense connections (DenseNet121) was additionally evaluated, which has fewer trainable parameters and is less computationally intensive than ResNet50 15.
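The split described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `stratified_split`, the random seed, the rounding rule, and the round-robin fold assignment are assumptions. Only the class sizes (221 and 244), the 85%/15% proportions, and the five folds come from the text.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.15, n_folds=5, seed=0):
    """Return (folds, test): lists of patient indices, balanced per class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    test = []
    folds = [[] for _ in range(n_folds)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Assumption: the test share of each class is rounded down.
        n_test = int(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        # Deal the remaining patients round-robin so that every
        # cross-validation fold keeps roughly the same class ratio.
        for pos, idx in enumerate(indices[n_test:]):
            folds[pos % n_folds].append(idx)
    return folds, test


# Class sizes as reported for the cohort; the labels are placeholders.
labels = ["alcoholic"] * 221 + ["other"] * 244
folds, test = stratified_split(labels)
train_size = sum(len(fold) for fold in folds)
print(train_size, len(test), [len(fold) for fold in folds])
```

With these assumptions the sketch yields 396 training and 69 test cases, matching the set sizes reported in the results.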
Furthermore, two different training strategies were evaluated for ResNet50 and DenseNet121 in order to examine whether altering the pre-trained parameters of the CNN affects classification performance. First, both networks were trained with frozen pre-trained parameters of the convolutional layers. In a second, subsequent training run, the pre-trained convolutional layers of both networks were unfrozen at several stages, with learning rates descending from the last to the first layer. Training was performed with Adam optimization, a cyclical learning rate scheme, and a cross-entropy loss function. Supplementary information on the experimental design and the hyper-parameters used for training is provided in Supplement S3.
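One way to read "descending learning rates from the last to the first layer" is a geometric schedule over layer groups, sketched below. The function name `layerwise_learning_rates`, the geometric spacing, the number of groups, and all numeric values are assumptions for illustration; the actual hyper-parameters are given in Supplement S3 and are not reproduced here.

```python
def layerwise_learning_rates(n_groups, max_lr, ratio):
    """Geometrically spaced learning rates, smallest for the first layer group.

    The last group (closest to the output) receives max_lr; each earlier
    group receives the rate of its successor divided by `ratio`.
    """
    if n_groups < 1 or ratio < 1:
        raise ValueError("need n_groups >= 1 and ratio >= 1")
    return [max_lr / ratio ** (n_groups - 1 - i) for i in range(n_groups)]


# Hypothetical values: 3 layer groups, last group at 1e-3, factor 10 per group.
rates = layerwise_learning_rates(n_groups=3, max_lr=1e-3, ratio=10)
for group, lr in enumerate(rates):
    print(f"layer group {group}: learning rate {lr:.0e}")
```

Early layers hold generic edge and texture filters from pre-training, so giving them the smallest updates limits how far they drift from the ImageNet initialization.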
Image regions that were particularly relevant to the classification task were highlighted by generating gradient-weighted class activation maps (Grad-CAM) for the test set 16.
[…] 17 were used for statistical analysis. Patient characteristics are expressed as frequencies or means with standard deviation, as appropriate. Classification accuracy (ACC) and receiver operating characteristic (ROC) analyses were performed for the cross-validation and the test sets for both studied CNN architectures (ResNet50, DenseNet121) and both training strategies (frozen, unfrozen). For the test set, 95% confidence intervals were determined for ACC and area under the curve (AUC) values. ROC and precision-recall curves were generated 18. Grad-CAM images were visually inspected by one experienced radiologist (A.F.), and highlighted regions were categorized according to their anatomical localization as being predominantly situated in the right liver lobe, the left liver lobe, the portal region, the caudate lobe, or the image background. Resulting categorical data were compared using either Fisher's exact test (for a cell count of ≤ 5) or the χ² test (for a cell count > 5), as appropriate. The two-sided t-test was used to compare differences between groups regarding continuous variables. P < 0.05 was set as the level of statistical significance.
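The metrics named above can be sketched in a few lines. The text does not state how the 95% confidence intervals were obtained; the percentile bootstrap below is an assumption, as are the function names, the 0.5 decision threshold, and the toy data, which are not study results.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a positive case scores higher than a
    negative one; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of cases whose thresholded score matches the label."""
    hits = sum((s >= threshold) == (y == 1) for y, s in zip(labels, scores))
    return hits / len(labels)


def bootstrap_ci(metric, labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval; resamples lacking a class are redrawn."""
    rng = random.Random(seed)
    n = len(labels)
    values = []
    while len(values) < n_boot:
        sample = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in sample]
        if len(set(ys)) < 2:
            continue
        values.append(metric(ys, [scores[i] for i in sample]))
    values.sort()
    lower = values[int(n_boot * alpha / 2)]
    upper = values[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper


# Toy data only (1 = alcoholic, 0 = other-than-alcoholic).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
point_auc = auc(y_true, y_score)
point_acc = accuracy(y_true, y_score)
ci_low, ci_high = bootstrap_ci(auc, y_true, y_score)
print(point_auc, point_acc, ci_low, ci_high)
```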

Classification of liver cirrhosis etiology.
Segmented images of the entire study population were randomly subdivided into a training (N = 396; 174 female; mean age, 60 ± 12 years), and a test set (N = 69; 29 female; mean age, 59 ± 10 years), with training sets being further split for fivefold cross-validation, balanced for patients with alcoholic and non-alcoholic cause of liver cirrhosis.
With frozen pre-trained parameters, a mean accuracy (ACC) of 0.69 and a mean area under the curve (AUC) of 0.78 were observed for ResNet50, and a mean ACC of 0.66 and a mean AUC of 0.78 were observed for DenseNet121 across all 5 validation splits (Table 2). With unfrozen pre-trained parameters, mean ACC values of 0.74 and 0.71 and mean AUC values of 0.83 and 0.82 were obtained for ResNet50 and DenseNet121 on cross-validated training data, respectively.
On test data, the classification performance of ResNet50 was higher than that of DenseNet121 when training with unfrozen parameters; however, the difference was not statistically significant (Value and 95% CI: […]). ROC and precision-recall curves of the models trained with unfrozen pre-trained parameters are given in Fig. 2.

Highlighted imaging regions according to Grad-CAM.
The decision process to classify liver cirrhosis as alcohol related or non-alcohol related was further visualized using Grad-CAM analysis for ResNet50 trained with unfrozen pre-trained parameters. According to Grad-CAM analysis, the right liver lobe (alcoholic liver cirrhosis 42%, 14/33; other-than-alcoholic liver cirrhosis 61%, 22/36) and the portal area (alcoholic liver cirrhosis 30%, 10/33; other-than-alcoholic liver cirrhosis 19%, 7/36) were the imaging regions that were most frequently decisive for the classification process in both groups. No significant differences in the distribution of decisive imaging regions were observed between the two patient groups (Table 3). Exemplary images of the Grad-CAM analysis are provided in Fig. 3.
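As an illustration of the categorical comparison, the right-lobe counts quoted above can be put into a 2 × 2 table and tested with a Pearson χ² test. This is a simplified sketch: collapsing the regions into "right lobe" versus "any other region" and omitting a continuity correction are assumptions, so the result is not necessarily the P value reported in Table 3. The function name `chi_square_2x2` is hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test without continuity correction for the
    table [[a, b], [c, d]]; returns (statistic, p value) for 1 d.o.f."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    stat = n * (a * d - b * c) ** 2 / denom
    # Survival function of the chi-square distribution with 1 d.o.f.
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p


# Counts from the text: right liver lobe decisive in 14 of 33 alcoholic
# and 22 of 36 other-than-alcoholic test cases.
stat, p = chi_square_2x2(14, 33 - 14, 22, 36 - 22)
print(f"chi2 = {stat:.2f}, p = {p:.2f}")
```

All four cells exceed 5, so under the rule stated in the methods the χ² test rather than Fisher's exact test applies, and for these counts the p value stays above the 0.05 threshold.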

Discussion
The purpose of this study was to investigate whether a deep learning-based analysis can aid in differentiating the etiology of liver cirrhosis based on routine clinical T2-weighted MRI. Acceptable to excellent discriminatory ability was found in distinguishing patients with alcoholic and other-than-alcoholic cirrhosis. In a previous study, a ResNet50 with frozen pre-trained ImageNet parameters was proposed for automatic detection of liver cirrhosis on T2-weighted MRI 13. The results of our proof-of-concept study extend the findings of this previous report and show that deep learning not only enables the detection of cirrhosis, but can also help in identifying the underlying cause of the disease. Although the ability of the ImageNet pre-trained ResNet50 to discriminate between alcoholic and other-than-alcoholic cirrhosis can be described as excellent 19, it is inferior to the differentiation of cirrhotic versus non-cirrhotic livers 13. This may be because the imaging criteria that indicate different causes of the disease are less distinctive than the criteria that distinguish a diseased organ from a non-diseased organ. For instance, it has been described that a hypertrophic appearance of the central hepatic parenchyma/caudate lobe is expected in alcohol-related cirrhosis, but also in cirrhosis related to primary sclerosing cholangitis and Budd-Chiari syndrome 11.
Of the two models investigated in the current study, ResNet50 showed higher classification performance on test data. However, its performance was not significantly higher than that of DenseNet121. Interestingly, for both CNNs, subsequent training with unfrozen pre-trained parameters did not significantly increase classification performance on test data. This may suggest that the general image features extracted by the convolutional kernels, learned during pre-training on the ImageNet database, generalize well to T2-weighted MRI images. An ensemble of the two models trained with unfrozen parameters achieved equal accuracy and a slightly higher AUC compared to ResNet50; however, the difference was not statistically significant. Therefore, no clear advantage was observed from using an ensemble of the two different ImageNet pre-trained architectures.
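The text does not specify how the outputs of the two models were combined. A common choice, assumed here, is soft voting: the class probabilities of the models are averaged and the class with the highest mean probability is predicted. The function name `ensemble_probabilities` and the toy probabilities are illustrative only.

```python
def ensemble_probabilities(model_outputs):
    """Average per-class probabilities over models (soft voting).

    model_outputs: one entry per model, each a list with one
    [p_alcoholic, p_other] row per patient.
    """
    n_models = len(model_outputs)
    n_patients = len(model_outputs[0])
    return [
        [sum(m[i][k] for m in model_outputs) / n_models for k in range(2)]
        for i in range(n_patients)
    ]


# Toy probabilities for two patients, not study data.
resnet50_probs = [[0.8, 0.2], [0.4, 0.6]]
densenet121_probs = [[0.6, 0.4], [0.2, 0.8]]
averaged = ensemble_probabilities([resnet50_probs, densenet121_probs])
# Predicted class index per patient: 0 = alcoholic, 1 = other-than-alcoholic.
predicted = [max(range(2), key=row.__getitem__) for row in averaged]
print(averaged, predicted)
```

Averaging helps most when the member models make different errors; two ImageNet pre-trained CNNs trained on the same images may be too similar for that, which would be consistent with the lack of a clear ensemble benefit described above.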
Grad-CAM analysis indicates that the imaging morphology of the right liver lobe and caudate lobe seems to contain particularly relevant information for discriminating alcoholic from other-than-alcoholic liver cirrhosis. This is in line with previous studies, which describe that the right posterior hepatic notch sign, defined as a sharp liver surface indentation at the posterior boundary of the right and caudate lobe, is particularly prevalent among patients with alcoholic liver cirrhosis 12,20. As described above, hypertrophy of the caudate lobe and central hepatic areas is more frequently observed in patients with alcohol-related disease, but is also seen in other etiologies. To the best of our knowledge, there are currently no studies presenting metrics for the diagnostic accuracy of cirrhosis etiology based on such imaging criteria 11,12. However, a very recent work investigated a radiomics approach that relates imaging features to the etiology of liver cirrhosis and also achieved promising results 21. Unlike the deep learning method presented in the current study, the proposed radiomics approach requires manual definition of regions of interest. To date, imaging features have not been used in routine clinical practice to identify alcohol as a cause of cirrhosis.

Table 3. Highlighted imaging regions according to gradient-weighted class activation maps (Grad-CAM). Results of the visual inspection of Grad-CAM images classified by ResNet50 are provided. Within each segmented image of the test set, highlighted regions were visually rated as being primarily located within the right liver lobe, the left liver lobe, the portal area, the caudate lobe, or the image background by one radiologist experienced in abdominal imaging (A.F.).
In clinical routine, liver cirrhosis is typically diagnosed by a combination of characteristic clinical and imaging findings, corresponding laboratory testing, and ancillary examinations such as abdominal sonography. While this work-up is usually straightforward for virus-related cirrhosis, it may require much more effort in patients with alcohol-related disease, which in many cases can be diagnosed only by exclusion, since there are no specific laboratory findings 22. Liver biopsy is recommended if cirrhosis etiology is uncertain, but is limited by its invasive nature, inter-observer variability, and potential sampling error 2,23. Moreover, cirrhosis-related parenchymal changes may hamper or even preclude correct histological analysis 2.

Table 3 column headers: Alcoholic liver cirrhosis (N = 33) | Other-than-alcoholic liver cirrhosis (N = 36) | P value
Compensated liver cirrhosis is frequently asymptomatic; thus, it may be assumed that many patients who undergo routine clinical MRI for other indications are unaware of a concomitant liver disease. In these patients, a pipeline that automatically identifies tissue alterations and can classify possible disease etiologies has the potential to better guide diagnostic pathways and thus initiate a specific therapy earlier. With the help of deep learning algorithms, simple cross-sectional imaging modalities could serve as imaging-based biomarkers for the classification of liver disease in the future. Particularly in alcoholic liver cirrhosis, timely and correct identification of the underlying etiology is crucial, as early abstinence was demonstrated to be the key determinant of long-term outcome 8,24. Purely clinical assessment of alcoholic liver disease may not be trivial in practice because it mostly relies on patients' self-report. In this regard, deep learning applications have the potential to aid diagnosis by also extracting relevant information that may not be readily apparent to the human eye.
Our study has several limitations. The algorithm was developed for binary classification only and does not currently support differentiation of various non-alcohol-related cirrhosis etiologies. Due to the limited number of patients within the respective subclasses and to ensure collectives of comparable size for classification, we decided to pool patients with other-than-alcoholic cirrhosis. Future studies with larger samples of the respective subgroups are needed to substantiate the findings from this proof-of-concept study and to expand its application. The clinical benefit would also be significantly increased by an extension to other etiologies. In particular, non-alcoholic fatty liver disease (NAFLD) is becoming the main cause of chronic liver disease in many countries, and the detection of metabolic-related cirrhosis on cross-sectional imaging should be further explored in future studies. Also, we were not able to analyze possible coexisting etiologies of liver cirrhosis in our explorative analysis, as detailed data on additional risk factors were not available due to the retrospective study design. However, future studies should evaluate the ability of deep learning methods to differentiate overlapping liver disease, such as both alcoholic and non-alcoholic steatohepatitis (BASH). Moreover, we exclusively used single-slice T2-weighted images of segmented livers and ImageNet pre-trained models, as these have been shown to be suitable for the detection of liver cirrhosis in a previous study 13. Future studies may also address a three-dimensional approach that accounts for extrahepatic manifestations in cirrhotic patients, or the use of other imaging sequences for differentiation of etiologies.

Figure 3. Exemplary images from the study population. ResNet50 trained with unfrozen pre-trained parameters was used for the classification task. Exemplary patients from the test set are provided, and imaging regions that were particularly relevant to the classification task are highlighted using the gradient-weighted class activation maps (Grad-CAM) method. Panels A1, B1, C1 provide exemplary patients from the test set with alcoholic liver cirrhosis. In panels A2, B2, C2, images of exemplary patients from the test set with other-than-alcoholic liver cirrhosis are presented. In panels A1, B1, A2, B2, regions within the right liver lobe appeared to be particularly relevant for the classification task, as indicated by Grad-CAM images. In panels C1 and C2, the portal liver region appeared to be most decisive for classification.

In summary, the results of this proof-of-principle study demonstrate that discrimination between alcoholic and other-than-alcoholic cirrhosis based on clinical T2-weighted single-slice images is feasible with acceptable to excellent discrimination ability. This indicates the potential of deep learning for a more comprehensive assessment of diffuse liver disease.

Data availability
The data sets analysed in this study are subject to data protection law and are therefore not publicly available.